REGR: to_csv created corrupt ZIP files when chunksize<rows #38728
pandas/io/common.py
Outdated
        )
        self.multiple_write_buffer.write(data)

    def _write(self) -> None:
shouldn't this actually be called .flush()? or is that conflicting?
I didn't think about that, flush might be a simpler solution! I will test that.
I renamed it to flush, but we still have to override close. Some tests (weirdly not all tests) were failing without overriding close.
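A minimal sketch of the pattern discussed in this thread, assuming a ZipFile subclass that buffers every write and emits a single archive entry in flush. The class name BufferedZipFile and the attributes archive_name and pending are hypothetical; this is not the code from pandas/io/common.py. It illustrates one plausible reason close has to be overridden as well: unless close calls flush first, the buffered data never reaches the archive before it is finalised.

```python
import io
import zipfile


class BufferedZipFile(zipfile.ZipFile):
    """Hypothetical sketch: collect all chunks, write one archive entry."""

    def __init__(self, file, archive_name):
        super().__init__(file, mode="w")
        self.archive_name = archive_name  # name of the single entry
        self.pending = io.BytesIO()       # holds every chunk until flush

    def write(self, data):
        # Replaces ZipFile.write (which adds a file from disk):
        # buffer the bytes instead of creating a new entry per call.
        self.pending.write(data)

    def flush(self):
        # Emit the buffered chunks as ONE entry; later calls do nothing.
        if self.pending.closed:
            return
        with self.pending:
            self.writestr(self.archive_name, self.pending.getvalue())

    def close(self):
        # Make sure the buffer is written before the archive is finalised.
        self.flush()
        super().close()


target = io.BytesIO()
archive = BufferedZipFile(target, "data.csv")
archive.write(b"a,b\n")
archive.write(b"1,2\n")
archive.close()
```

After close, the archive in target holds one entry named data.csv containing both chunks in order.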
lgtm ping on green.
thanks @twoertwein very nice
@meeseeksdev backport 1.2.x
…when chunksize<rows
Something went wrong ... Please have a look at my logs.
…size<rows (#38767) Co-authored-by: Torsten Wörtwein <twoertwein@users.noreply.github.com>
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
When ZipFile's write is called multiple times, it will create multiple files within the zip file (with the same filename).
Edit: This also happens independently of chunksize, as https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/writers.pyx#L14 calls writerows multiple times.
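The behaviour described above can be reproduced with the standard library alone. The sketch below uses zipfile.ZipFile.writestr directly as a stand-in for the pandas wrapper's write method; the helper name entry_names_after_chunked_writes and the archive name data.csv are made up for illustration.

```python
import io
import warnings
import zipfile


def entry_names_after_chunked_writes(chunks, archive_name="data.csv"):
    """Write each chunk under the same name and list the resulting entries."""
    target = io.BytesIO()
    with warnings.catch_warnings():
        # zipfile emits a UserWarning about the duplicate name; silence it here.
        warnings.simplefilter("ignore", UserWarning)
        with zipfile.ZipFile(target, mode="w") as archive:
            for chunk in chunks:
                archive.writestr(archive_name, chunk)
    # Re-open the finished archive and report every entry it contains.
    with zipfile.ZipFile(io.BytesIO(target.getvalue())) as archive:
        return archive.namelist()


names = entry_names_after_chunked_writes([b"a,b\n", b"1,2\n", b"3,4\n"])
print(names)  # ['data.csv', 'data.csv', 'data.csv']
```

Each call adds a separate entry rather than appending to the existing one, so three chunks yield three entries that share one name.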